Interprocedural optimisation of regular parallel computations at runtime

Author

  • Olav Beckmann
Abstract

This thesis concerns techniques for efficient runtime optimisation of regular parallel programs that are built from separate software components. High-quality, high-performance parallel software is frequently built from separately-written reusable software components such as functions from a library of parallel routines. Apart from the strong case from the software engineering point-of-view for constructing software in such a way, there is often also a large performance benefit in hand-optimising individual, frequently used routines. Hitherto, a problem with such libraries of separate software components has been that there is a performance penalty, both because of invocation and indirection overheads, and because opportunities for cross-component optimisations are missed. The techniques we describe in this thesis aim to reverse this disadvantage by making use of high-level abstract information about the components for performing cross-component optimisation. The key is to specify, generate and make use of metadata which characterise both data and software components, and to take advantage of run-time information. We propose a delayed evaluation, self-optimising (DESO) library of data-parallel numerical routines. Delayed evaluation allows us to capture the control-flow of a user program from within the library at runtime. When evaluation is eventually forced, we construct, at runtime, an optimised execution plan for the computation to be performed by the user program. The fact that our routines are purely data-parallel means that the key optimisation we have to perform is to determine parallel data placements which minimise the overall execution time of the parallel program. A key challenge in optimising at runtime is that optimisation itself must not be allowed to slow down the user program. 
We describe a range of techniques which permit us to fulfil this requirement: we work from optimised aggregate components which are not re-optimised at runtime; we use carefully constructed mathematical metadata that facilitate efficient optimisation; we re-use previously calculated optimisation results at runtime; and we use an incremental optimisation strategy in which we invest more optimisation time as optimisation contexts are encountered more frequently. We propose specific algorithms for optimising affine alignment of arrays in data-parallel programs, for detecting opportunities to re-use previous optimisation results, and for optimising replication in data-parallel programs. We have implemented our techniques in a parallel version of the widely used Basic Linear Algebra Subroutines (BLAS) library, and we provide initial performance results obtained with this library.
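The delayed-evaluation idea at the heart of a DESO library can be sketched in a few lines: library calls record an expression DAG instead of computing, and only when a result is demanded is the whole recorded computation (over which a real implementation would first optimise data placements) executed. The sketch below is illustrative only; the class and function names (`LazyArray`, `force`, etc.) are assumptions for this example, not the thesis's actual API.

```python
# Minimal sketch of delayed evaluation: calls build a DAG, force() runs it.
class LazyArray:
    """A value whose computation is recorded rather than performed."""
    def __init__(self, op, inputs=(), data=None):
        self.op = op              # "literal", "add", or "scale"
        self.inputs = list(inputs)
        self.data = data          # concrete payload for "literal" nodes

def literal(values):
    return LazyArray("literal", data=list(values))

def add(a, b):
    return LazyArray("add", (a, b))

def scale(k, a):
    return LazyArray("scale", (literal([k]), a))

def force(node):
    """Evaluation is forced: walk the recorded DAG and execute it.
    A real DESO library would first plan data placements across the
    whole DAG at this point; here we simply interpret the nodes."""
    if node.op == "literal":
        return node.data
    args = [force(i) for i in node.inputs]
    if node.op == "add":
        return [x + y for x, y in zip(args[0], args[1])]
    if node.op == "scale":
        return [args[0][0] * x for x in args[1]]
    raise ValueError(f"unknown op {node.op!r}")

# Building the expression performs no arithmetic ...
y = add(scale(2, literal([1, 2, 3])), literal([10, 10, 10]))
# ... until the result is demanded:
print(force(y))  # [12, 14, 16]
```

Because the full DAG is visible before any work is done, the optimiser sees the program's runtime control flow across component boundaries, which is exactly what makes cross-component placement optimisation possible.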


Similar resources

Runtime Interprocedural Data Placement Optimisation for Lazy Parallel Libraries (Extended Abstract)

We are developing a lazy, self-optimising parallel library of vector-matrix routines. The aim is to allow users to parallelise certain computationally expensive parts of numerical programs by simply linking with a parallel rather than sequential library of subroutines. The library performs interprocedural data placement optimisation at runtime, which requires the optimiser itself to be very effic...

Full text


Efficient Interprocedural Data Placement Optimisation in a Parallel Library

This paper describes a combination of methods which make interprocedural data placement optimisation available to parallel libraries. We propose a delayed-evaluation, self-optimising (DESO) numerical library for a distributed-memory multicomputer. Delayed evaluation allows us to capture the control-flow of a user program from within the library at runtime, and to construct an optimised execution ...

Full text

Experiments with Parallelising Numerical Applications via DESOLibraries (Extended Abstract)

DESOLibraries are "delayed evaluation, self-optimising" parallel libraries of numerical routines. The aim is to allow users to parallelise computationally expensive parts of numerical programs by simply linking with a parallel rather than sequential library of subroutines. The library performs interprocedural data placement optimisation at runtime, which requires the optimiser itself to be ver...

Full text

Language Extensions and Compilation Techniques for Data Intensive Computations

Processing and analyzing large volumes of data plays an increasingly important role in many domains of scientific research. Typical examples of very large scientific datasets include long running simulations of time-dependent phenomena that periodically generate snapshots of their state, archives of raw and processed remote sensing data, and archives of medical images. High-level language and com...

Full text


Publication date: 2001